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1. INTRODUCTION 

The past years witnessed an urgent need to increase the profits of and competitions between compa- 
nies.so, companies have resorted to making quick and accurate decisions. Data warehouses (DWs) represent a 
modern technology that helps analyze and collect data and make appropriate decisions that are in line with the 
requirements of the work environment [1]. DW is a computer system designed to serve as a central repository 
of information generated from a set of data sources that represents the database application used to collect and 
analyze the historical information of an institution resulting from daily operations [J], [2]. Data flows from 
relational databases, such as from transactional systems to the target DW consist of multiple species of data, 
namely, structured, semi-structured and unstructured. These various data are subjected to processing, standard- 
ization and uploading and then employed in business intelligence. Users, such as business analysts, decision 
makers and data scientists, rely on various tools such as structured query language (SQL) clients, spreadsheets 
and business intelligence tools, to access data that are processed in DW [I]. 

In any organization, DW acts as a decision support system, and it relies on the historical data of the 
institution. Decision-makers’ conclusions and decisions are primarily based on the results of the data analyzed. 
DW is used within the public domain across specific field [3]. DW was described by Inmon Bill as a time- 
variant, subject-oriented, non-volatile and integrated collection of data [1]. The key features of any DW are 
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discussed as: 


— Subject oriented: the first characteristic of a DW is that it is subject-specific and organized according to 
different commercial visions; in any DW, data can be analyzed in custom to any topic field [I], [3]. 


— Integrated: data are collected from multiple heterogeneous sources. In the DW environment, all these 
sources are combined and stocked in a mutual place [4]-(6]. 


— Non-volatile: all data in DW remain fixed; once then they are loaded and not permanently deleted as 
long as the historical data are relied upon [4], [5]. 


— Time variant: it is important to maintain historical data where historical positions of activities in time 
DW load. The data specified for each company have a retrieval limit, and data are retrieved according to 
the recovery limit from a period of three months to 10 years [7|-[9]. 


After introducing DW and its most important characteristics in this section, section 2 surveys some 
related studies. Section 3 presents DW applications and overview of the DW architecture. An explanation 
of the data extraction, transformation and loading (ETL) process of the proposed system is given in detail in 
section 4. The experiment results are described in section 5 and the conclusion is presented in section 6. 


2. RELATED WORK 

The DW and ETL have been adopted in many applications. A lot of research has been presented in 
this field. The following paragraphs provide details about recent studies and applications applied DW and ETL 
techniques. 

In [2019] Jayashree and Priya proposed DW design principles for the supply chain. In a supply chain’s 
environment, visibility is the most important factor because it helps reduce errors to achieve reliability for 
clients by the company. Visibility deals with a set of methods and strategies, including inventory, coordination 
and allocation strategies, that any company can employ to provide excellent service to its customers anywhere, 
any time and in the manner required. The advantages of the supply chain visibility are mostly realized as 
cost reductions. The case study for the proposed DW designed for order life cycle visibility involves sales 
representatives. It can reduce the time that sales representatives spend on pulling order information because 
they can access open orders by using mobile devices and emails. As a result, sales representatives can have 
more time to sell and provide customers with the information needed for order shipments. For the supply chain 
market, a previous case study presented the design architecture for order visibility. Logical data were employed 
to explain the systematic procedure of ETL series jobs. These data can be used in any order-processing system, 
such as the supply chain field BB}. 

In [2019] Homayouni ef al. explained and developed a bank model. The developed bank model 
helps determine the reserve requirements of a bank, thereby helping increase customers’ capabilities to obtain 
loans according to the bank’s reserves. Loans are granted to customers. When they depend on the reverse, the 
performance of each branch is known, and the steps required to improve performance are determined depending 
on workhouse data reports, which also help the bank define a set of parameters to obtain abundant knowledge 
from the bank’s DW [4]. 

In [2017] Fatima et al. recommended two approaches to access ideal applications for constructing 
an information distribution center for colleges. The first one was ’Kimball approach (bottom-up)”. It was 
recommended because of its high responses to customers’ requests. The customers’ needs are well developed 
from the beginning stage, which is the most accurate stage and characterized by the implementation of high 
demand. In this approach, data mart reports and analysis are created first then combined together to be the 
DW. This is a test for most relationships because client fundamentals and customers’ requirements change 
often. In the second approach, the ’Inmon approach (top-down design)”, a relational data model collected from 
different resources is created first. The data pass through ETL stages. Afterwards, the dimensional data marts, 
reports and applications that contain the data needed for custom business processes or specified departments 
are outputted by the DW. The Kimball approach considers dimensional models, such as star schemas, that use 
dimensional tables and fact tables to order data in dimensional DWs, whereas the Inmon approach uses data 
mart as a separation between ETL and the final data [10]. 

In [2017] Khan ef al.: in this study, an illustration of a query cache method was used to design an 
optimized DW before ETL processing, improve the developed design of the DW, and reduce the level of errors. 
The data coalition rule was used to identify errors, inappropriate data and defects. The purpose of this method 
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is to store queries and the outputs related to them [11]. 


In [2017] Santoso and Yulia: in this study, the design and implementation of a useful educational DW 
for a higher education institution represented by the University of Basra in Iraq were presented. The proposed 
system was implemented based on two simulated databases, each of which depended on different constraints 
of ETL systems according to the type of data. The simulated databases were collected in two different periods 
from two different institutions: from the Department of Computer Science at the College of Science, University 
of Basra, for the last 10 years and from the University of Iraq for the past four years; then, they implemented 
the databases by using SQL server data tool (SSDT) 2012 and SQL Server 2014 [12]. 


The current study aims to design a DW system that contains a web ETL process that divides the work 
between the input device and server. The first step begins in the user’s device to correct errors, where the 
input data are determined from different sources. Then, cleaning rules are selected, and cleaning preprocessing 
begins to detect and remove errors, standardize data formats and improve data quality. The data resulting from 
the cleaning processes are saved. Afterwards, the second step is performed to ensure non-repetition data. The 
data table is sent to a server, and a comparison is performed between the intermediate table and DW in the 
server. If data are not available, they will be loaded from Interim Table to DW in the server. This system avoids 
the problem of material depletion in the sub-warehouse, where the city warehouses that control the stock in the 
cities are linked to the general warehouse of the governorate’s global store. 


3. DW’S APPLICATIONS 


DW technologies are employed in sundry fields for their efficient ETL methodology. DW has been 
widely used in various fields, where any organization or industry needs to conduct data analysis to determine 
the level of development and achieve growth [11], [13]. DW plays an active role in daily life and many other 
fields, such as medicine, business, banking, finance, web marketing, market segmentation and manufacturing, 
that consist of many stages, including process design, product planning, production and scheduling. Long-term 
strategic issues and profitability are closely related to and influenced by the decisions made, and many industries 
need decision-making systems [14]. DW technology is better than traditional decision-making techniques 
because it relies on different applications to collect, consolidate and store data, leading to the expansion of 
operations and increased efficiency. Notably, analyzing data in separate applications consumes much time. 
At this stage, to perform some manufacturing routines, transaction processing systems are used, and these are 
updated in a timely manner [15]-[17]. In this section, the most important issues related to ETL are addressed in 
terms of construction, design and stages. On this basis, an ETL is built, and the benefits of ETL are highlighted. 


3.1. ETL process 


The ETL process refers to a sequential set of operations that includes extracting, transforming and 
loading data [18]. ETL process is the core and backbone of the DW architecture [19]. It aims to standardize 
the processing of the source data and make the data available and appropriate for the intended use [7]. The role 
of ETL is to extract data from different sources, such as relational databases, flat files, web pages or any other 
type of data. The quality of data is consistent and good [I], [20], [21]. The separated resources are combined 
and used together to build a model for making strategic business decisions and to answer critical business 
intelligence queries [22]. Such data is processed, modified and subsequently loaded into another database [23]. 
The last pattern of data is the friendly user [24]. Web ETL is designed to communicate and interact with 
databases by reading diverse file formats within the organization [9]. Web ETL is a complex task that involves 
extracting many different types of data [11]. The extracted data go through an intermediate stage in which the 
data are scrubbed and cleaned to remove any inappropriate data and eliminate abnormalities (special characters 
and duplicate data) as well as to fill the missing values and attributes, identify and remove outliers, smooth 
and level the noisy data and determine and settle inconsistencies in transforming these data as needed [21], [B]. 
Business transformation rules (like applying calculations and concatenations) are then applied to the cleansed 
data, which are subsequently arranged consistently and taken for loading into the target DW [18]. Finally, the 
loaded data are made available for utilization. Such data can be generated and used as a part of the ETL process 
in several sources, including enterprise resource planning (ERP), centralized server application, document or 
an Excel spreadsheet and customer relationship management (CRM) [2], [25]. 
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3.2. Stages of the ETL process 

The ETL process involves three phases. These phases represent the main steps of our system. They 
should be implemented in a consecutive way. The phases are described in sequence as: 

Extraction: external data are extracted from various source systems to an intermediate stage [3], [19]. 
In this stage, transformations are conducted to ensure that the performance of the source system does not 
deteriorate. In the staging layer, the data extracted from different sources undergo full validation before being 
loaded into DW to ensure that no corrupted data exist at the source because the procedure of retracting the data 
after loading from damaged data is a challenge. As a part of the ETL process, DW provides integration among 
systems that contain different hardware [17]. Before the data are extracted and physically loaded, logical 
mapping must be conducted on the data. Data mapping defines the relationships between the source and the 
target. The following shows examples of data extraction techniques: i) complete charge extraction, ii) partially 
extract load - without any update notification, and iii) partially extract load - accompanied by a notice. The 
following shows examples of the types of validations for data extraction: i) data mapping from source to target 
records, 11) removal of unwanted data, iii) verification of data kinds, and iv) removal of duplicate and split data 
(9). (10). 

Transformation: the data extracted from the source layer are raw and not directly usable, so they 
must undergo cleaning and optimization so that they can be converted and easily dealt with. In this step, a 
set of functions or rules is applied to some extracted data, with the exception of some data that do not need 
conversions (direct transfer or pass-through data). During the transformation step, many customized operations 
are performed on the data. An example is when the end user needs the total sales income and it is not available 
in the database [22], [26]. 

Loading: it is the final step in ETL. After the conversion of data into a specific DW format, the data 
are loaded to the target DW scheme. In DW, data loading is time dependent because a huge volume of data 
is loaded within a short period of time. Developed performance optimization techniques are included in the 
loading process to improve performance. To maintain data integrity, failed data loading should be avoided. 
To ensure data preservation and avoid data loss, recovery mechanisms are prepared [3]. All stages mentioned 
above are clarified in Figure[I] 


Oracle 
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Lan Data Warehouse 
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Flat file 
ETL Process 
Extract 
Figure 1. ETL process steps 


3.3. Benefits of ETL 

ETL has many benefits to the field it applied to. It has been adopted on a large scale in many institu- 
tions. Here we are listing some of the ETL benefits: 1) one of the most important benefits of using ETL is to 
deal with complex business needs that the transac- tional databases cannot handle; ii) DW is a common reposi- 
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tory for sharing and reusing data; i111) ETL transfers data to DW from different sources; iv) the modifications on 
the sources of data (e.g., adding, deleting and updating) are made automatically in DW; and v) ETL allows for 
integration between the sources of data and the final system because it provides a comparison of data samples 
with the source and the final system. 


4. PROPOSED SYSTEM 

This section presents additional details of the proposed system. It also shows the design of the DW 
that contains a we ETL process that divides its work between the input device and server. The proposed system 
solves the lack of work partitioning between the input and device and server. 


4.1. General description of the proposed system 


The proposed system is based on setting a DW in each city. The warehouse maintains the provision 
of a minimum level of paint products in each city distributed among the sales centers. The warehouses of 
the cities controlling the stock in the cities are connected to the general warehouse, which is connected to the 
global warehouse of the governorate. When ordering paint materials, the salesman supplies customers with 
products available at his center. The proposed system transfers the orders to other sales centers in the city in the 
event that any product reaches the minimum stock level. The DW within the city prepares weekly reports for 
administration officials. The reports contain materials whose stock has reached two folds the minimum level 
for their preparation. The reports also show what paint materials were spent during this week in each center. In 
addition, the reports include sale prices and profits arising from the sale. The most important thing the reports 
focus on is the colors spent in each city, their type and their use. City warehouse officials request to supply the 
shortage of paint materials from the governorate warehouse. the county store officials are provided with reports 
for each city. The city warehouse report contains the consumed materials, the desired colors, their quality and 
their use. It also contains the selling prices and profits obtained. Figure 2|illustrates scenario of data movement 
and coating material in the proposed system. 


Painting Warehouse in the Govemorate 


Figure 2. Scenario of proposed system 
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Within the proposed system, there is an administrator in each city and an administrator of the gover- 
norate warehouses. Therefore, the responsibilities of administrators differ according to the connection to the 
data warehouse that they deal with. The county store official enters the materials’ data and prices according 
to the important information. In addition, it distributes dyes to cities according to their consumption. The city 
administrator distributes the dyes he received to his sales centers. 


4.2. Data sources 

The system has been tested on two different datasets. These datasets has been collected in two different 
ways. The process of collecting these datasets is explained as: the first source collects data from five offices 
and every office includes four centers to sell dyes in different areas in Al Anbar Governorate. The data sources 
differ according to the system or application used, such as spreadsheet, access database and data FoxPro) and 
the total number of data fields from the five offices in the various sales centers is approximately 5,809,600. 
Manual data entry is the second source, and the total number of data fields from this source is 1,691,611. The 
data sources used are varied. Data collection is conducted in eight months. After data collection, a DW has a 
total of 7,501,211 fields. 


4.3. Web with ETL 

The first stage of the ETL process extracts data from sources. The data extracted are incomplete and 
not usable in their original form, so after the extraction of all data from these sources, the data are forwarded to 
the second stage, which includes cleaning the input data (e.g., correcting spelling mistakes in entering data) and 
transforming them to the standardized form. Such transformation refers to the color name and date of product 
input in more than one form. For example, 01-01-2020, 01 January 2020 and 01/01/2020, should be unified into 
one form. All data after preprocessing and cleaning are aggregated in an interim table (ITbl), which represents 
the data staging area. Then, non-repetition preprocessing begins to ensure that the data are not duplicated. 
This table sends the match between the data in ITbl and the data in the server to the server to test whether the 
data have been previously entered or not. Data are loaded from ITbl to DW if they are unavailable. The final 
step of the ETL process a multidimensional DW is made through a star scheme with the data uploaded within 
a special server purchased from the site smarterasp.net, Which is characterized by high security by using the 
rivest-shamir-adleman (RSA) Algorithm [I]of web ETL. 


Algorithm 1 Web-ETL design 


Require: Data form input devices 
Ensure: Cleaning data then Loading to Data Warehouse (DW) on web 
In User Device: Determine input data and the cleaning rules are selected. 
Start Cleaning Pre-processing. 
Determine values from input with attributes of DW Values € DWinServer. 
Testing the mistakes and missing in input data. 
The appropriate logical methods use to correct errors and missing of data. 
End Begin. 
Create interim table (ITbl), ITbl The data after cleaning. 
Begin Non-repetition Preprocessing. 
Connected with DW in the Server. 
Sending ITbl to Server. 
if (Data duplication between ITbl and DW = Yes) then 
Return Message (“Data are already entered”) to User Device. 
else 
Return Message (“Data are not entered”) to System on User Device. 
end if 
End Begin. 
if (System Receive Message) then 
Loading data from ITbl to DW. 
end if 
Close DW and Disconnect. 
In User Device: Show the message “Adding data successfully”. 


4.4. Star scheme of the DW in the proposed system 
DWs are multidimensional databases used in analytical processing. According to the types of DW 
scheme DW design can rely on the star data warehouse scheme in which the center represents the fact table. 
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It contains the ID of each of the other dimensions (three dimensions), fact table control and distributed data 
between tables. It also contains many attributes e.g., (date_in, time _in). Our DW scheme includes the following 
attributes, which are shown in Figure[3| 


Authorized Table 


auth_numberID | Fact Table 
auth_numberID 


Materials Table 


auth_center_name m_RQcodeID 


auth_place exp_numberID m_name 


auth phone m_RQcodeID 
m_RQcode 


m_ quantity 


m_prod_date 


auth_number m_exp date 


Exports Table exp_number 


m_prod_place 


exp_numberID date_in m_quality 


exp_ quantity time_in m_color 
exp_date m_purch_price 


m_ supplier 


Figure 3. Data warehouse scheme 


— Fact table ID, m_RQcode, auth_number, exp_number, date_in, time_in). 


Materials table (m_RQcode, m_name, m_quantity, m_prod_date, m_exp_date, m_prod_place, m_quality, 
m_color, m_purch_price, m_supplier). 


— Authorized table (auth_number, auth_center_name, auth_place, auth_phone). 


Exports table (exp_number, exp_quantity, exp_date). 


4.5. Interface of the proposed system 

The proposed system uses a mobile phone’s applications programmed in Java language that works on 
android devices. This application is used by the owners of sales centers spread in cities. Barcode reader devices 
are also linked to mobile phones because they are dependent on the work. The proposed system uses program 
language C# from Visual Studio 2017. In addition, it uses (active server pages) ASP.net graphics interfaces 
for city and county warehouse administrators; this is attributed to the existence of numerous settings and wider 
workspace. Internet browsers on computers Figure 4]and smart devices Figure [5]can view them without any 
need to have special devices. This system distributes to the same office and its cell center. 


5. | RESULTS AND DISCUSSION 
The system has been evaluated depending on three factors. These factors are widely used to show the 
performance of the system. The following sections explain all the details about these factors. 


5.1. Execution time for ETL 

Time is one of the most important factors that are considered to test the efficiency of any application. 
The proposed system is fast. At the data entry stage, the system does not consume much time. The data 
presented are characterized by speed that meets the needs of decision-makers and conforms to their conditions. 
At the analysis stage, a few seconds are used to extract the results and make the most appropriate decision. 
Table[I] clarifies these statements. 
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Figure 4. Computer interface for application 
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Figure 5. Mobile interface for application 


Table 1. Execution time for ETL 


Subject Fields Time in seconds 
ETL without missing Some 0.01570 
All 0.00582 
ETL with missing Some 0.01816 
All 0.03427 
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5.2. 


Speed data exchange between DWs 


The speed of data exchange between the central DW (mother DW) in the governorate and the sub- 


warehouses (small DW) in each city is calculated, as illustrated in Table [2| The speed calculation is based on 
three data sizes of 10, 50 and 100 kB under the premise that the speed of internet used is constant on all sides. 
The internet speed is 512 download and 128 upload Figure[6] 


5.3. 


Table 2. Speed data exchange between data warehouses 


From To DatainKB Time in seconds 
Main input data Mother DW 10 0.00298 
Main input data Mother DW 50 0.00735 
Main input data Mother DW 100 0.01098 
Mother DW Small DW 50 0.01163 
Mother DW Small DW 100 0.04710 
Mother DW Small DW 150 0.09026 
Small DW User interface 10 0.02196 
Small DW User interface 50 0.06004 
Small DW User interface 100 0.09714 


Time in Seconds 


0.08 
0.06 
i. 0.04 
' 0,02 
ae am — , 


100KB 5SOKB 10KB 150KB 100KB 50KB 100KB S5SOKB 10 KB 


Data in KB 


Figure 6. Time speed of data exchange between DWs 


System evaluation 


The system provided good results. However, it’s not possible to proof its accuracy without an evalua- 


tion that shows how this system is doing. In this section we show the performance of our system. The system 
has many features and they are summarized as : 


Validity (business rule conformance): the proposed system is characterized by its conformity with work- 
ing conditions of other institutions, where it can be implemented on different data systems whilst taking 
into account the needs of the institution with a slight change in the names and number of columns. 


Interactive with the user: the interfaces of the system are easy to use, clear to the user and very under- 
standable. The user’s interaction with these interfaces will lead to his interaction with the algorithm of 
the proposed system. 

Performance: the proposed system is characterized by its high performance in entering data, analyzing 
data and building queries by using a web server, so it can be used in many other disciplines and branches. 
Accuracy: The best display options can be relied on to improve the accuracy of queries. Queries are built 
with a new resolution technology that deals with web options and is the best way to improve performance. 
Ease of implementation: the proposed system can be implemented using other programming languages, 
which provides flexibility in implementing the system. 
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6. CONCLUSION 


ETL is a major and vital part of DW. It fetches data from its main database to the destination DW. In 
this study, a mobile phone application representing a DW with web ETL is designed. The proposed system is 
based on two DW. The main DW (painting warehouse) is in the governorate, and the sub-warehouse in each 
city. The warehouse maintains the provision of a minimum level of paint products in each city distributed 
among the sales centers. The efficiency of this application in terms of speed of response to demand, speed of 
performance, interaction with the customer, ease of implementation and accuracy is proven in this work. In the 
future, the proposed system will be used in data analysis. 
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